FRACTURE mining: Mining frequently and concurrently mutating structures from historical XML documents

نویسندگان

  • Ling Chen
  • Sourav S. Bhowmick
  • Liang-Tien Chia
چکیده

In the past few years, the fast proliferation of available XML documents has stimulated a great deal of interest in discovering hidden and nontrivial knowledge from XML repositories. However, to the best of our knowledge, none of existing work on XML mining has taken into account the dynamic nature of XML documents as online information. The present article proposes a novel type of frequent pattern, namely, FRequently And Concurrently muTating substructUREs (FRACTURE ), that is mined from the evolution of an XML document. A discovered FRACTURE is a set of substructures of an XML document that frequently change together. Knowledge obtained from FRACTURE is useful in applications such as XML indexing, XML clustering etc. In order to keep the result patterns concise and explicit, we further formulate the problem of maximal FRACTURE mining. Two algorithms, which employ the level-wise and divide-and-conquer strategies respectively, are designed to mine the set of FRACTUREs. The second algorithm, which is more efficient, is also optimized to discover the set of maximal FRACTUREs. Experiments involving a wide range of synthetic and real-life datasets verify the efficiency and scalability of the developed algorithms.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining Frequently Changing Substructures from Historical Unordered XML Documents

Recently, there is an increasing research efforts in XML data mining. These efforts largely assumed that XML documents are static. However, in many real applications, XML data are evolutionary in nature. In this paper, we focus on mining evolution patterns from historical XML documents. Specifically, we propose a novel approach to discover frequently changing structures (FCS) from a sequence of...

متن کامل

XML structural delta mining: Issues and challenges

Recently, there is an increasing research efforts in XML data mining. These research efforts largely assumed that XML documents are static. However, in reality, the documents are rarely static. In this paper, we propose a novel research problem called XML structural delta mining. The objective of XML structural delta mining is to discover knowledge by analyzing structural evolution pattern (als...

متن کامل

FASST Mining: Discovering Frequently Changing Semantic Structure from Versions of Unordered XML Documents

In this paper, we present a FASST mining approach to extract the frequently changing semantic structures (FASSTs), which are a subset of semantic substructures that change frequently, from versions of unordered XML documents. We propose a data structure, H-DOM, and a FASST mining algorithm, which incorporates the semantic issue and takes the advantage of the related domain knowledge. The distin...

متن کامل

Discovering Minimal Infrequent Structures from XML Documents

More and more data (documents) are wrapped in XML format. Mining these documents involves mining the corresponding XML structures. However, the semi-structured (tree structured) XML makes it somewhat difficult for traditional data mining algorithms to work properly. Recently, several new algorithms were proposed to mine XML documents. These algorithms mainly focus on mining frequent tree struct...

متن کامل

Mining Maximal Frequently Changing Subtree Patterns from XML Documents

Due to the dynamic nature of online information, XML documents typically evolve over time. The change of the data values or structures of an XML document may exhibit some particular patterns. In this paper, we focus on the sequence of changes to the structures of an XML document to find out which subtrees in the XML structure frequently change together, which we call Frequently Changing Subtree...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Data Knowl. Eng.

دوره 59  شماره 

صفحات  -

تاریخ انتشار 2006